Abstract

The relation between the cooperativity of protein folding and the Random-Field
Ising Model (RFIM) is established. A generalization of the Imry-Ma argument
predicts a cooperative folding transition for small heterogeneity of the
interactions stabilizing the native structure. Monte Carlo simulation of a lattice
model shows that, above some finite heterogeneity, the folding transition is
not cooperative and involves the formation of domains.
The statistical mechanics of protein folding begins with the pioneering work of Go and
coworkers. In that study, a simple lattice model able to fold into a unique 3D structure
was introduced. It was demonstrated that the folding transition in this model is
of first order or, in the terminology of protein science, cooperative.

The Hamiltonian of the model can be written as follows 

$$H = -b \sum_{i<j} \Delta(r_i^0, r_j^0)\, \Delta(r_i, r_j) \qquad (1)$$

where the indices $i$ and $j$ number the monomers along the protein chain, $r_i$ is the
lattice coordinate of monomer $i$ in the current conformation, and $r_i^0$ is the lattice
coordinate of monomer $i$ in a quenched target conformation. Only self-avoiding
conformations are allowed. $\Delta(r_i, r_j) = 1$ if monomers $i$ and $j$ are in contact,
that is, if they are neighbors on the lattice, and $\Delta(r_i, r_j) = 0$ otherwise.
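
The definitions above can be illustrated with a minimal Python sketch of Hamiltonian (1) on a cubic lattice. The function names, the toy 8-monomer target, and the choice to skip chain neighbors (which are always adjacent and so only shift the energy by a constant) are assumptions made here for illustration; they are not taken from the original study.

```python
from itertools import combinations

def contacts(conf):
    """Return the set of contacts (i, j), i < j, of a lattice conformation.

    conf is a list of integer (x, y, z) coordinates, one per monomer.
    Two monomers are in contact when they occupy nearest-neighbor sites.
    Chain neighbors (j == i + 1) are skipped (assumption of this sketch).
    """
    found = set()
    for i, j in combinations(range(len(conf)), 2):
        if j - i < 2:
            continue
        dist = sum(abs(p - q) for p, q in zip(conf[i], conf[j]))
        if dist == 1:
            found.add((i, j))
    return found

def go_energy(conf, native_contacts, b=1.0):
    """Energy (1): -b times the number of native contacts present in conf."""
    return -b * len(contacts(conf) & native_contacts)

# Toy target (hypothetical): a chain of 8 monomers filling a 2x2x2 cube.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
          (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
native_contacts = contacts(native)

# A fully extended chain has no contacts at all.
extended = [(k, 0, 0) for k in range(8)]
```

In this toy case the target has five native contacts, so its energy is $-5b$, while the extended chain has zero energy.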

The meaning of the Hamiltonian is simple: monomers $i$ and $j$ attract each other
with energy $b > 0$ if they are in contact in the target conformation; otherwise
they do not interact. Clearly, the ground state of the chain corresponds to $r_i = r_i^0$,
that is, the target conformation has the lowest energy. Thus, the target conformation
should be associated with the native state of a protein. Correspondingly, the contacts
present in the target conformation will be referred to as native contacts. Monte Carlo
simulation of the model on a lattice showed that, starting from random unfolded
conformations, a chain is able to fold spontaneously into the target conformation.
Moreover, as the parameter $b$ is varied, a first-order transition is observed between
the unfolded state, a set of coil conformations with few contacts, and the folded
state, which corresponds to fluctuations around the target conformation. Thus, this
simple model reproduces behavior characteristic of small globular proteins.

There are two main approximations in the model. First, all interactions between
monomers that do not form a contact in the native conformation are neglected. This
approximation is based on the assumption that protein amino acid sequences are
evolutionarily selected to provide a low energy for the native conformation. It has been
shown that a low energy of the native conformation is a necessary and sufficient
condition for rapid folding to a stable structure. Sequences selected by the requirement
of low energy in the native conformation are expected to exhibit a first-order phase
transition between the unfolded state, with only a few native contacts, and the folded
state, close to the native conformation. The non-native contacts contribute
substantially only to the unfolded state, where they determine its compactness: if
non-native contacts are attractive enough on average, the unfolded state is a compact
globule; otherwise it is a coil.

The second approximation is that all native contacts are assumed to have
the same strength. In the present letter we relax this approximation and study the effect
of heterogeneity of the native contacts on the folding transition. We find that the folding
transition is of first order for small heterogeneity of the native interactions, but above
some finite heterogeneity the folding transition is not cooperative and involves the
formation of domains.

The Hamiltonian we study is the following 

$$H = -\sum_{i<j} b_{ij}\, \Delta(r_i^0, r_j^0)\, \Delta(r_i, r_j) \qquad (2)$$

where $b_{ij}$ are independent quenched random parameters drawn from the same Gaussian
distribution with mean $b$ and dispersion $\delta$. If $\delta = 0$, we recover the original
model described by Hamiltonian (1). As mentioned above, in this case there is a first-order
folding transition. What happens if $\delta \neq 0$?
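
As a rough illustration of Hamiltonian (2), the following Python sketch draws one quenched coupling per native contact and evaluates the energy of a conformation from the set of contacts it has formed. The function names, the seeded generator, and the example contact list are hypothetical; $\delta$ is used here as the standard deviation of the Gaussian, which is an interpretation of "dispersion" assumed for this sketch.

```python
import random

def sample_couplings(native_contacts, b, delta, seed=0):
    """Draw b_ij = b + delta * eps_ij with eps_ij ~ N(0, 1), one per native contact.

    The couplings are quenched: they are drawn once and then kept fixed.
    """
    rng = random.Random(seed)
    return {c: b + delta * rng.gauss(0.0, 1.0) for c in sorted(native_contacts)}

def heterogeneous_energy(formed_contacts, couplings):
    """Energy (2): minus the sum of b_ij over the native contacts that are formed.

    Non-native contacts (absent from couplings) do not contribute.
    """
    return -sum(couplings[c] for c in formed_contacts if c in couplings)

# Example with a hypothetical list of five native contacts.
native_contacts = {(0, 3), (0, 7), (1, 6), (2, 5), (4, 7)}
uniform = sample_couplings(native_contacts, b=1.0, delta=0.0)
disordered = sample_couplings(native_contacts, b=1.0, delta=0.5)
```

With `delta=0.0` every coupling equals `b`, so the sketch reduces to Hamiltonian (1).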

To approach this problem, consider the following analogy between the system studied here
and the RFIM. Each native contact between monomers $i$ and $j$ can be mapped
onto a spin $S_k$ of the RFIM, with $S_k = 2\Delta(r_i, r_j) - 1$. Under this mapping,
the folded state corresponds to ferromagnetic order in the up direction, and the unfolded
state to ferromagnetic order in the down direction. Accordingly, the attraction
parameter $b_{ij}$ corresponds to a local external magnetic field $h_k$. Finally, the
counterpart of the exchange interaction between spins, which is responsible for
ferromagnetic order, arises because the formation of one native contact facilitates the
formation of neighboring native contacts. This effective interaction between neighboring
contacts results in a free energy cost for an interface between the folded and unfolded
phases. This free energy cost corresponds to the surface energy of a domain wall in the RFIM.
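
The correspondence between $b_{ij}$ and $h_k$ can be made explicit by substituting the spin variable into Hamiltonian (2). In the short worked step below, the index $k$ runs over native contacts only, $\Delta_k$ indicates whether contact $k$ is formed, and $b_k$ denotes the strength $b_{ij}$ of that contact; this labelling is notation introduced here for illustration.

```latex
\Delta_k = \frac{1 + S_k}{2}
\quad\Longrightarrow\quad
H = -\sum_k b_k \Delta_k
  = -\sum_k \frac{b_k}{2}\, S_k \;-\; \frac{1}{2}\sum_k b_k .
```

The last term is a constant, so the bare energy acts on each spin as a field $h_k = b_k/2$ whose random part is proportional to $\delta$. The effective exchange term and the chain entropy are not contained in this bare energy; they arise from the chain connectivity as described in the text.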

In addition to the effective interaction between native contacts that are neighbors
in the native structure, there is another kind of interaction due to the polymeric
structure of the system. Indeed, formation of the native contact between monomers $i$ and
$j$ with $i < j$ makes formation of the native contact between monomers $i_1$ and $j_1$
more favorable if $i_1 > i$ and $j_1 < j$. Note that this polymeric effect produces
effective interactions between contacts that may be distant from each other in the native
structure.

The analogy between the present model and the RFIM suggests applying the domain
argument of Imry and Ma to analyze the effect of randomness in $b_{ij}$ on
the thermodynamics of protein folding. First, we consider the stability of the folded state
with respect to melting of a small part of the native structure. The average attraction
$b$ between interacting monomers is chosen so that the unfolded and folded states
have equal stability, that is, the system is at the transition point. If $R$ is the size of
the melted domain, then the surface energy cost is of order $R^{d-1}$ in $d$ dimensions,
and the gain in energy is of order $\delta R^{d/2}$. The gain arises because, by statistical
fluctuation, the contacts in a suitably located domain are weaker than the average native
contact.
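
The balance described in this paragraph can be written out as a short worked estimate. The surface tension $\sigma$ and the omission of numerical prefactors are assumptions of this sketch, made in the spirit of the usual Imry-Ma argument.

```latex
\Delta F(R) \sim \sigma R^{d-1} - \delta R^{d/2},
\qquad
\frac{\delta R^{d/2}}{\sigma R^{d-1}} = \frac{\delta}{\sigma}\, R^{\,1 - d/2} .
```

For $d > 2$ the exponent $1 - d/2$ is negative, so the ratio of gain to cost decreases with $R$: if $\delta$ is small compared with $\sigma$, the cost dominates at every scale. For $d < 2$ the gain wins at large $R$ for any nonzero $\delta$.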

The surface energy cost and the statistical gain in free energy are similar to those
in the RFIM. In addition, there is an entropic cost due to polymeric bonds. A domain of size
$R$ consists of a number of subchains, each with a typical number of monomers $g$
(Fig. 1a). $g$ can be estimated from the condition that the ideal size of a subchain is of
the order of the domain size: $a g^{1/2} \sim R$, where $a$ is the length of a lattice bond.
The number of such subchains in one domain is $n \sim R^d/g \sim R^{d-2}$. The loss of
entropy upon melting of the domain is due to the fact that both ends of each subchain
remain fixed. The loss of entropy due to fixing one end is of order $\ln((R/a)^d)$, so the
overall loss of entropy per domain due to polymeric bonds is of order $R^{d-2} \ln R$.
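
The chain of estimates in this paragraph can be collected in one place. The version below keeps the lattice bond length $a$ explicit and drops numerical prefactors; it is a restatement of the steps above, not an independent derivation.

```latex
g \sim \left(\frac{R}{a}\right)^{2},
\qquad
n \sim \frac{(R/a)^{d}}{g} \sim \left(\frac{R}{a}\right)^{d-2},
\qquad
\Delta S \sim 2n \,\ln\!\left(\frac{R}{a}\right)^{d}
        \sim \left(\frac{R}{a}\right)^{d-2} \ln\frac{R}{a} .
```

The factor $2n$ counts the two fixed ends of each of the $n$ subchains.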

Comparing this loss of entropy with the surface energy cost at large $R$, one sees that
the entropic term, of order $R^{d-2} \ln R$, is negligible compared with $R^{d-1}$, so in
any dimension the polymeric nature of the system is irrelevant. This shows that the
analogy with the RFIM is exact when the stability of the folded state with respect to small
disorder in the energies of the native contacts is considered. Thus, one can immediately
conclude that the lower critical dimension for protein folding is $d_c = 2$, so that in three
dimensions the folded state is stable with respect to small heterogeneity in the energies of
the native contacts. Moreover, there must be some finite critical value of the heterogeneity
$\delta_c$ above which the unfolding transition is no longer of first order and involves
non-cooperative melting of domains. Note that in two dimensions even small
disorder should destroy the first-order folding transition, so caution is needed in
interpreting simulations on two-dimensional lattices.

So far, the stability of the folded state with respect to disorder has been analyzed.
The stability of the unfolded state deserves separate consideration. The
first step is to estimate the free energy of formation of one folded domain of size $R$. The
surface energy cost and the statistical energy gain are the same as in the case, considered
above, of melting of a domain within the folded state. However, the loss of entropy due
to polymeric bonds is very different. Now the ends of loops of length about $N/n$, where
$N$ is the length of the whole chain, are fixed (Fig. 1b). The corresponding loss of entropy
is of order $n \ln(N/n) \sim R^{d-2} \ln(N/R^{d-2})$. For large $N$ this loss of entropy is
very large compared with the surface energy cost, so the unfolded state appears to be very
stable with respect to disorder.
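
To get a feel for these scaling estimates, the Python sketch below evaluates the three terms numerically. All prefactors and the bond length $a$ are set to 1, which is an assumption of this sketch; only the trends with $R$ and $N$ are meaningful, not the absolute numbers. The function names are hypothetical.

```python
import math

def surface_cost(R, d=3):
    """Surface energy cost of a domain of size R, of order R^(d-1)."""
    return R ** (d - 1)

def melt_entropy_loss(R, d=3):
    """Entropy loss for a melted domain inside the folded state.

    n ~ R^(d-2) subchains, each losing about ln(R^d) per fixed end.
    """
    return R ** (d - 2) * d * math.log(R)

def single_domain_entropy_loss(R, N, d=3):
    """Entropy loss for one folded domain inside the unfolded state.

    n ~ R^(d-2) loops of length about N / n have their ends fixed.
    """
    n = R ** (d - 2)
    return n * math.log(N / n)

# In d = 3 the melted-domain entropy term falls below the surface cost at large R,
# while the single-domain term keeps growing with the chain length N.
for N in (1e4, 1e6, 1e8):
    print(N, surface_cost(5), round(single_domain_entropy_loss(5, N), 1))
```

The loop prints the fixed surface cost next to a single-domain entropy loss that grows logarithmically with `N`, which is the behavior the paragraph describes.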

In fact, this is not the case. The unfolded state is stable only with respect to the
formation of a single domain. The situation changes drastically when the simultaneous
formation of many domains is considered (Fig. 1c). In contrast to the RFIM, where there is
no interaction between such domains, in the present system polymeric bonds produce an
entropic interaction between domains. An accurate estimate of the loss of entropy per
domain due to polymeric bonds in the case of many domains gives the same result as for the
melting of a domain within the folded state. As a result, the folded and unfolded states
are stable under the same conditions.

It should be mentioned that all these conclusions are based on heuristic
arguments of the Imry-Ma type. A rigorous proof analogous to that of Bricmont and
Kupiainen for the RFIM is still needed. In the meantime, it makes sense to check the
conclusions by simulation of a lattice model. A cubic lattice was used together with the
standard Monte Carlo method [10] to simulate the motion of a chain. The target conformation
used in the simulation is shown in Fig. 2. This maximally compact conformation of a chain
of $N = 48$ monomers has $C_{max} = 57$ contacts. Correspondingly, $C_{max}$ random numbers
$\epsilon_{ij}$ with a Gaussian distribution of zero mean and unit dispersion were generated,
and we set $b_{ij} = b + \delta \epsilon_{ij}$. For each value of $\delta$, the transition
value $b_{tr}(\delta)$ was estimated by means of long Monte Carlo simulations. The transition
point was defined as the value of $b$ at which the chain has $C_{max}/2$ native contacts on
average. Then another long simulation ($10^8$ Monte Carlo steps) at $b = b_{tr}(\delta)$
was performed to collect data for the histogram of the number of native contacts $C$.
The result is presented in Fig. 3.
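
The text does not say how the transition value was located in practice. One possible approach, sketched below in Python, is bisection on $b$ using the stated criterion that the mean number of native contacts equals $C_{max}/2$. The function `find_transition`, the callable `mean_contacts`, and the sigmoid stand-in are all hypothetical; in a real calculation `mean_contacts(b)` would be a Monte Carlo average, which is noisy, so the tolerance would have to be matched to the statistical error.

```python
import math

def find_transition(mean_contacts, c_max, b_lo, b_hi, tol=1e-3):
    """Bisect for the b at which mean_contacts(b) equals c_max / 2.

    Assumes mean_contacts increases monotonically with b and that the
    interval [b_lo, b_hi] brackets the transition.
    """
    target = c_max / 2
    if not (mean_contacts(b_lo) < target < mean_contacts(b_hi)):
        raise ValueError("interval does not bracket the transition")
    while b_hi - b_lo > tol:
        mid = 0.5 * (b_lo + b_hi)
        if mean_contacts(mid) < target:
            b_lo = mid
        else:
            b_hi = mid
    return 0.5 * (b_lo + b_hi)

# Stand-in for a simulation (not real data): a smooth step centered at b = 0.8.
def toy_mean_contacts(b, c_max=57):
    return c_max / (1.0 + math.exp(-20.0 * (b - 0.8)))

b_tr = find_transition(toy_mean_contacts, c_max=57, b_lo=0.0, b_hi=2.0)
```

For the toy curve the bisection returns a value within the tolerance of 0.8, the point where the stand-in function crosses $C_{max}/2$.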

It is clearly seen that at $\delta = 0$ the distribution of the number of native contacts
is bimodal, with two peaks. One peak at $C \approx 7$ should be associated with the unfolded
state and the other peak at $C = C_{max}$ with the folded state. Thus, the system
exhibits the behavior typical of a first-order phase transition. For small $\delta$ the
picture does not change qualitatively as $\delta$ increases: the distribution is still
bimodal, but the peaks shift toward each other. Finally, at $\delta_c \approx 1.7$ the
behavior changes qualitatively: for $\delta > \delta_c$ the distribution of $C$ is unimodal
with a maximum near $C_{max}/2$, which implies that the folding transition is continuous.
Thus, the results of the simulation of the lattice model are in accordance with the
predictions of the heuristic arguments of the Imry-Ma type.
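
The distinction between a bimodal and a unimodal histogram can also be checked programmatically. The Python sketch below counts the peaks of a histogram after a moving-average smoothing. The smoothing window, the height threshold, and the two example histograms are assumptions chosen for illustration; the histograms are synthetic and are not the data of Fig. 3.

```python
def smooth(hist, window=3):
    """Moving average with a window truncated at the edges."""
    half = window // 2
    out = []
    for k in range(len(hist)):
        lo, hi = max(0, k - half), min(len(hist), k + half + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out

def count_modes(hist, window=3, min_fraction=0.1):
    """Count local maxima of the smoothed histogram (hist must be non-empty).

    A plateau counts as one maximum. Maxima lower than min_fraction times
    the global maximum are ignored as noise.
    """
    s = smooth(hist, window)
    floor = min_fraction * max(s)
    n, k, modes = len(s), 0, 0
    while k < n:
        end = k
        while end + 1 < n and s[end + 1] == s[k]:
            end += 1  # extend over a plateau of equal values
        left_lower = k == 0 or s[k - 1] < s[k]
        right_lower = end == n - 1 or s[end + 1] < s[k]
        if left_lower and right_lower and s[k] >= floor:
            modes += 1
        k = end + 1
    return modes

# Synthetic examples (not simulation data).
two_state = [1, 5, 9, 5, 1, 0, 0, 1, 4, 8, 12]
continuous = [0, 1, 3, 6, 9, 10, 9, 6, 3, 1, 0]
```

Here `count_modes(two_state)` finds two peaks and `count_modes(continuous)` finds one, mirroring the first-order and continuous cases described above.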

An obvious application of these results is related to the protein design problem. In
order to design a stable and rapidly folding protein, one has to provide a low energy of the
target conformation relative to the other, misfolded conformations. Two parameters, the
Z-score and the ratio $T_f/T_g$, were proposed as quantitative measures of design quality.
Both parameters are based on the approximation of the protein conformational space by
the Random Energy Model (REM) and, in fact, within the REM approximation these
parameters are monotonic functions of each other. More importantly, neither parameter
depends on the dispersion $\delta$ of the energies of the native interactions and, therefore,
they are not sensitive to the effects studied in the present letter.

Thus, a design procedure based on the Z-score or on the ratio $T_f/T_g$ results in some
value $\delta = \delta_0$. If $\delta_0 < \delta_c$, then the design procedure generates
proteins with a cooperative folding transition. However, if $\delta_0 > \delta_c$, then the
design procedure generates proteins with thermodynamic domains. Note that in this case, in
order to stabilize the folded state, one has to decrease the temperature, so that the
folding rate becomes very slow. Interestingly, the design of model lattice proteins of
moderate size [13] based on the Z-score actually produces chains with thermodynamic
domains, which implies that $\delta_0 > \delta_c$ in this model. Thus, in order to design a
protein with a cooperative folding transition, one has to control the value of $\delta$.
An example of an improved design algorithm that tends to minimize $\delta$ is studied
elsewhere.

Figure Captions

Fig. 1 Stability with respect to formation of domains. Shaded areas correspond to the
folded state. (a) Melting of a domain in the folded state. (b) Formation of a domain in the
unfolded state. (c) Interaction between the formed domains in the unfolded state.

Fig. 2 Target conformation of a chain of 48 monomers on a cubic lattice. The conformation
was obtained by folding a homopolymer with attraction between its monomers.

Fig. 3 Distribution of the number of native contacts $C$ at different values of the
heterogeneity $\delta$.